Who feeds the world ? And how wealthy are they ?

There are folium maps in this notebook. If they do not display well, please have a look at the .html file from the same repository.

Abstract

Our main goal in this project is to find social and economic world-wide relations of countries based on the state of their agricultural sector, using indicators such as imports, exports, production, self-sufficiency, etc. In order to find such indicators, we would like to use the data from the "Global Food & Agriculture Statistics" datasets. First, we would like to produce a map showing which countries "feed the world" i.e. which countries are net-exporting food. That map would contain a slider to show how this evolved over the past fifty years. Then we would like to show countrywise the level of food self-sufficiency i.e. the way a country does not need to trade with other ones in order to feed its population. We will also compare it to nations' economic development and see if some correlations can be found. We will visualize our best findings with several interactive maps and plots.

Research questions

We would like to work on the following research questions:

  • How does the production and consumption of food look like from a geographical point of view ? Which countries are net food exporters or importers? How did this evolve over the last few decades ?
  • What's the level of self-sufficiency in food production of individual countries and how does this change over time ?
  • Is there a link between the GDP per capita and the agricultural trade balance ? Are countries that are net exporters or importers richer ? Are self-sufficient countries richer ?
  • If we find any relations, are they also still valid if we check for specific crops ? Are there some crops that are mostly produced by richer countries, some that are mostly produced by poorer countries?

External imports:

In [1]:
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import folium
import seaborn as sns
import json
import re
import requests
from bs4 import BeautifulSoup
from ipywidgets import interact
from IPython.display import display
import scipy.cluster.hierarchy as spc

Auxiliary function imports:

We have implemented some functions into a dedicated module (file Milestone_2_scripts.py) in order to simplify the code and make this notebook more enjoyable to read.

In [2]:
#from Milestone_2_scripts import *

Setup:

In [3]:
data_folder_path = "./Data/current_FAO/raw_files/"

files = {"Crops production" : "Production_Crops_E_All_Data_(Normalized).csv",
         "Crops trade" : "Trade_Crops_Livestock_E_All_Data_(Normalized).csv", 
         "Consumer price indices" : "ConsumerPriceIndices_E_All_Data_(Normalized).csv",
         "Macroeconomy" : "Macro-Statistics_Key_Indicators_E_All_Data_(Normalized).csv",
         "Livestock production" : "Production_Livestock_E_All_Data_(Normalized).csv",
         "Live animals trade" : "Trade_LiveAnimals_E_All_Data_(Normalized).csv"
        }
interesting_datasets = files.keys()

1.A. Dataset description

Our main dataset would be a subset of the "Global Food & Agriculture Statistics" that is found in the proposed datasets list. In this dataset, we have seen that we could work with the production as well as import and export quantities per year and per country. We will add information about countries GDP to this database.

1.B. Loading the data set

In [4]:
def load_datasets(datasets) :
    df = {}
    for dataset in datasets :
        file_path = data_folder_path + files[dataset]
        df[dataset] = pd.read_csv(file_path, encoding = "ISO-8859-1")
    return df

We load each interresting dataset in the dictionary df :

In [5]:
df = load_datasets(interesting_datasets)

1.C. Understanding the data set

In this part, we will have a first look of the datasets in order to get a first sense of the data.

In [6]:
def display_df(df, datasets):
    for dataset in datasets :
        display(dataset, df[dataset].sample(5))

In order to see what does the datasets look like, we display a sample of 5 rows for each of them :

In [7]:
display_df(df, interesting_datasets)
'Crops production'
Area Code Area Item Code Item Element Code Element Year Code Year Unit Value Flag
2404836 5501 Australia & New Zealand 401 Chillies and peppers, green 5312 Area harvested 2006 2006 ha 2787.0 A
2148444 5207 South America 1717 Cereals,Total 5419 Yield 2007 2007 hg/ha 36907.0 Fc
587432 238 Ethiopia 44 Barley 5419 Yield 2002 2002 hg/ha 14409.0 Fc
1832713 237 Viet Nam 780 Jute 5312 Area harvested 1976 1976 ha 14170.0 NaN
2038812 5200 Americas 44 Barley 5510 Production 1979 1979 tonnes 18150705.0 A
'Crops trade'
Area Code Area Item Code Item Element Code Element Year Code Year Unit Value Flag
6802620 143 Morocco 390 Juice, tomato 5910 Export Quantity 1982 1982 tonnes 0.0 NaN
10549752 227 Tuvalu 631 Waters,ice etc 5622 Import Value 1965 1965 1000 US$ NaN M
2906564 49 Cuba 571 Mangoes, mangosteens, guavas 5910 Export Quantity 1966 1966 tonnes NaN M
11823510 5000 World 1942 Eggs in The Shell 5610 Import Quantity 1982 1982 tonnes 823102.0 A
1718732 35 Cabo Verde 426 Carrots and turnips 5922 Export Value 1992 1992 1000 US$ NaN M
'Consumer price indices'
Area Code Area Item Code Item Months Code Months Year Code Year Unit Value Flag Note
17208 63 Estonia 23013 Consumer Prices, Food Indices (2010 = 100) 7012 December 2013 2013 NaN 186.20000 X 1997
25356 104 Ireland 23013 Consumer Prices, Food Indices (2010 = 100) 7005 May 2005 2005 NaN 102.10000 X 2016
38279 157 Nicaragua 23013 Consumer Prices, Food Indices (2010 = 100) 7004 April 2010 2010 NaN 154.16000 X 2006
21236 86 Grenada 23013 Consumer Prices, Food Indices (2010 = 100) 7008 August 2012 2012 NaN 113.62000 X 2010M1
52466 210 Sweden 23012 Consumer Prices, General Indices (2010 = 100) 7003 March 2010 2010 NaN 99.62228 X 2010
'Macroeconomy'
Area Code Area Item Code Item Element Code Element Year Code Year Unit Value Flag
361482 166 Panama 22015 Gross Fixed Capital Formation 6103 Share of GDP in US$ 2000 2000 % 21.185653 Fc
422412 197 Sierra Leone 22076 Value Added (Manufacture of food, beverages an... 6114 Value Local Currency, 2005 prices 1987 1987 millions NaN NaN
532049 5000 World 22078 Value Added (Manufacture of tobacco products) 6110 Value US$ 2014 2014 millions 67093.231345 A
99766 96 China, Hong Kong SAR 22015 Gross Fixed Capital Formation 6108 Value US$, 2005 prices 2010 2010 millions 45229.162934 XAM
271711 125 Liechtenstein 22015 Gross Fixed Capital Formation 6155 Annual growth Local Currency 1989 1989 % 9.767438 Fc
'Livestock production'
Area Code Area Item Code Item Element Code Element Year Code Year Unit Value Flag
80965 150 Netherlands 866 Cattle 5111 Stocks 1974 1974 Head 4666000.0 NaN
160678 5817 Net Food Importing Developing Countries 866 Cattle 5111 Stocks 1988 1988 Head 237296770.0 A
132332 5101 Eastern Africa 1016 Goats 5111 Stocks 1970 1970 Head 47918926.0 A
133830 5103 Northern Africa 1181 Beehives 5114 Stocks 2009 2009 No 2674229.0 A
23161 40 Chile 1749 Sheep and Goats 5111 Stocks 1962 1962 Head 7596100.0 A
'Live animals trade'
Area Code Area Item Code Item Element Code Element Year Code Year Unit Value Flag
108945 41 China, mainland 1921 Bovine, Animals 5908 Export Quantity 1992 1992 Head 166385.0 A
34609 16 Bangladesh 1882 Agricult.Products,Total 5622 Import Value 1961 1961 1000 US$ 131771.0 A
486931 215 United Republic of Tanzania 1057 Chickens 5622 Import Value 1976 1976 1000 US$ 3136.0 NaN
398234 244 Samoa 10 Total Merchandise Trade 5922 Export Value 1972 1972 1000 US$ 5011.0 NaN
409076 196 Seychelles 866 Cattle 5608 Import Quantity 1981 1981 Head 0.0 NaN

At first glance, our datasets seem very clean.

Each of our dataset contains a column "Year" and a column that is either named "Area" or "Country". This is a great news for us since we want to do a both geographical and time-related analysis.

The columns "Area" and "Country" both correspond to the country except that the "Area" may contains a group of country (e.g. "Eastern Europe").

1.D. Cleansing the data set

In this part, we will clean the datasets. The final goal is to produce one uniformized and normalized dataset on which we could work (see 1.F).

Such a cleaned dataset may look like this (in a very simplistic way):

Country | Year | GDP | Crops production | Livestock production

1.D.a. Removing unuseful data

In this section, we will create dataframes in df_useful which correspond to previous dataframes without the unuseful data.

In [8]:
df_useful = {}
1.D.a.i. Extracting GDP from the "Macroeconomy" dataset

The "Macroeconomy" dataset contains many different measures: Gross Fixed Capital Formation, Gross National Income, Value Added (Total Manufacturing), ... We are only interested in Gross Domestic Product. Therefore, we extract it Gross Domestic Product from the "Macroeconomy" dataset. In order to have uniformisation among values, we choose the US$ value. All of them have the same unit (millions US$) so we can drop the "Unit" column as well.

In [9]:
def extract_GDP(df):
    def selection_GDP(df):
        return df['Item']=='Gross Domestic Product'
    def selection_US_dollars(df):
        return df['Element']=="Value US$"
    def drop_columns(df):
        dropped_colmuns = ["Item Code", "Item", "Element Code", "Element", "Flag", "Year Code", "Unit"]
        return df.drop(columns = dropped_colmuns)
    return drop_columns(df[selection_GDP(df)&selection_US_dollars(df)])
In [10]:
df_useful["GDP"] = extract_GDP(df["Macroeconomy"])

We can have have a look at a sample of the extrated dataset:

In [11]:
display(df_useful["GDP"].sample(5))
Area Code Area Year Value
44038 255 Belgium 2000 2.379047e+05
201999 175 Guinea-Bissau 1991 6.494428e+02
383944 117 Republic of Korea 2005 8.981372e+05
400778 191 Saint Vincent and the Grenadines 1996 3.292766e+02
556569 5404 Western Europe 1987 2.952139e+06

And we can plot GDP in million US$ for different countries for the period 1970-2015:

In [12]:
select_switzerland = df_useful["GDP"]['Area']=='Switzerland'
select_france = df_useful["GDP"]['Area']=='France'
select_austria = df_useful["GDP"]['Area']=='Austria'
select_canada = df_useful["GDP"]['Area']=='Canada'
ax = df_useful["GDP"][select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = df_useful["GDP"][select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful["GDP"][select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful["GDP"][select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
Out[12]:
<matplotlib.legend.Legend at 0x1bd2a8ff5c8>

For dissolute or new countries, we have some Nan values (before appearing or after dissolution) as in this next example :

In [13]:
select_USSR = df_useful["GDP"]['Area']=='USSR'
select_russia = df_useful["GDP"]['Area']=='Russian Federation'
select_ukraine = df_useful["GDP"]['Area']=='Ukraine'
ax = df_useful["GDP"][select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = df_useful["GDP"][select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful["GDP"][select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
Out[13]:
<matplotlib.legend.Legend at 0x1bd2af43ec8>
1.D.a.ii. Extracting crops harvested area, production, seed and yield from the "Crops production" dataset

We want to extract crops harvested area, production, seed and yield from the "Crops production" dataset. As all crops are not food crops, we request the World crops database to keep only the food crops.

In [14]:
def get_food_crops():
    #Return a list of crops categorized as food crops https://world-crops.com/food-crops/
    url="https://world-crops.com/food-crops/"
    r=requests.get(url,headers={"User-Agent": "XY"})
    soup=BeautifulSoup(r.text,'html.parser')
    elements_temp=soup.find_all('a',href=re.compile("^../"))
    elements=[el.text for el in elements_temp]
    
    #only 40 elements are displayed on each page->iterating on the total list
    for i in range(40,401,40):
        url_i=url+"?ss="+str(i)
        r=requests.get(url_i,headers={"User-Agent":"XY"})
        soup=BeautifulSoup(r.text,'html.parser')
        new_elements=soup.find_all('a',href=re.compile("^../"))
        elements+=[el.text for el in new_elements]
    return elements

def inclusive_search(string,elements):
    #returns true if the string can be found in elements. The search removes special characters from string in order to include more positive results
    string=string.lower()
    delimiters = ",", "(","&",")"," and "," "
    pattern = '|'.join(map(re.escape, delimiters))
    strings=list(filter(None,re.split(pattern,string)))
    found=False
    for s in strings:
        if s=="nes":
            continue
        for el in elements:
            found=(s in el.split())
            if found==False and s[-1]=="s":
                found=s[:-1] in el.split()
            if found==False and s[-2:]=="es":
                found=s[:-2] in el.split()
            if found==False and s[-3:]=="ies":
                found=s[:-3]+"y" in el.split()
            if found==True:
                return found
    return found


def get_food_crop_data(df):    
    #extracts the food crop data, returns 4 df: Area,Production,Seed and yield    
    df=df.copy()
    food_crops=list(map(lambda x: x.lower(),get_food_crops()))              
    crop_types_df=df[['Item','Value']].groupby('Item').sum()
    crop_types_df=crop_types_df[list(map(lambda x : inclusive_search(x,food_crops) , crop_types_df.index ))]   
    food_crop_df=df[df.Item.apply(lambda x: x in crop_types_df.index)]
    return (food_crop_df[food_crop_df.Element=='Area harvested'],
            food_crop_df[food_crop_df.Element=='Production'],
            food_crop_df[food_crop_df.Element=='Seed'],
            food_crop_df[food_crop_df.Element=='Yield'])
  
food_crop_area_df , food_crop_production_df , food_crop_seed_df , food_crop_yield_df = get_food_crop_data(df["Crops production"])
In [15]:
df_useful['Crops Area harvested'] = food_crop_area_df.drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Crops Production'] = food_crop_production_df.drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Crops Seed'] = food_crop_seed_df.drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Crops Yield'] =  food_crop_yield_df.drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])

We check everything is fine by looking at samples for each of the new dataframes:

In [16]:
display(df_useful['Crops Area harvested'].sample(5))
display(df_useful['Crops Production'].sample(5))
display(df_useful['Crops Seed'].sample(5))
display(df_useful['Crops Yield'].sample(5))
Area Code Area Item Year Unit Value
428316 48 Costa Rica Watermelons 2007 ha 1020.0
1689586 222 Tunisia Broad beans, horse beans, dry 1967 ha 54000.0
1477146 195 Senegal Cabbages and other brassicas 1988 ha 500.0
1465948 244 Samoa Vegetables&Melons, Total 1965 ha 120.0
662068 79 Germany Cranberries 1998 ha NaN
Area Code Area Item Year Unit Value
67566 1 Armenia Pears 2011 tonnes 23039.0
945720 114 Kenya Plums and sloes 1986 tonnes NaN
1437610 184 Rwanda Plantains and others 2011 tonnes NaN
2258401 5305 Western Asia Beans, green 2008 tonnes 651853.0
2386791 5500 Oceania Lentils 1993 tonnes 8018.0
Area Code Area Item Year Unit Value
2105255 5204 Central America Wheat 1985 tonnes 138764.0
2450408 5706 European Union Cow peas, dry 2000 tonnes 13.0
773407 95 Honduras Wheat 2013 tonnes 193.0
547028 58 Ecuador Roots and tubers, nes 1974 tonnes 4600.0
1780879 231 United States of America Rapeseed 1972 tonnes 0.0
Area Code Area Item Year Unit Value
2338327 5403 Southern Europe Bananas 1994 hg/ha 376144.0
362080 41 China, mainland Pepper (piper spp.) 1970 hg/ha 2500.0
2561314 5815 Low Income Food Deficit Countries Spinach 1980 hg/ha 53215.0
982902 121 Lebanon Grapes 2011 hg/ha 86974.0
1286696 166 Panama Fruit, fresh nes 2000 hg/ha 72000.0

We also make some plots to have a first understanding of the dataset:

In [17]:
select_Maize = df_useful['Crops Area harvested']['Item']=='Maize'
maize_df = df_useful['Crops Area harvested'][select_Maize]

select_switzerland = maize_df['Area']=='Switzerland'
select_france = maize_df['Area']=='France'
select_austria = maize_df['Area']=='Austria'
select_canada = maize_df['Area']=='Canada'
ax = maize_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
Out[17]:
<matplotlib.legend.Legend at 0x1bd2ca63308>
In [18]:
select_USSR = maize_df['Area']=='USSR'
select_russia = maize_df['Area']=='Russian Federation'
select_ukraine = maize_df['Area']=='Ukraine'
ax = maize_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
Out[18]:
<matplotlib.legend.Legend at 0x1bd2cafac08>
1.D.a.iii. Extracting stocks production from the "Livestock production" dataset

We want to extract stocks production from the "Livestock production" dataset. Again, we drop the columns that are useless for us and have a first look of the data with a sample and some plots.

In [19]:
selection_stocks = df['Livestock production']["Element"] == 'Stocks'
df_useful['Livestock production'] = df['Livestock production'][selection_stocks].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
In [20]:
display(df_useful['Livestock production'].sample(5))
Area Code Area Item Year Unit Value
14537 20 Botswana Sheep and Goats 1975 Head 880000.0
36429 58 Ecuador Horses 1993 Head 510000.0
128745 251 Zambia Pigs 2013 Head 1098951.0
110066 209 Swaziland Chickens 1996 1000 Head 980.0
156057 5706 European Union Turkeys 1969 1000 Head 17768.0
In [21]:
select_pigs = df_useful['Livestock production']['Item']=='Pigs'
pigs_df = df_useful['Livestock production'][select_pigs]

select_switzerland = pigs_df['Area']=='Switzerland'
select_france = pigs_df['Area']=='France'
select_austria = pigs_df['Area']=='Austria'
select_canada = pigs_df['Area']=='Canada'
ax = pigs_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
Out[21]:
<matplotlib.legend.Legend at 0x1bd2d58a608>
In [22]:
select_USSR = pigs_df['Area']=='USSR'
select_russia = pigs_df['Area']=='Russian Federation'
select_ukraine = pigs_df['Area']=='Ukraine'
ax = pigs_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
Out[22]:
<matplotlib.legend.Legend at 0x1bd2c43e0c8>
1.D.a.iv. Extracting import and export quantities from the "Live animals trade" and "Crops trade" datasets

Now, we extract import and export quantities from the "Live animals trade" and "Crops trade" datasets, having again some samples and some plots.

In [23]:
selection_import_quantities = df['Live animals trade']["Element"] == 'Import Quantity'
selection_export_quantities = df['Live animals trade']["Element"] == 'Export Quantity'

df_useful['Live animals import quantities'] = df['Live animals trade'][selection_import_quantities].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Live animals export quantities'] = df['Live animals trade'][selection_export_quantities].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
In [24]:
display(df_useful['Live animals import quantities'].sample(5))
Area Code Area Item Year Unit Value
97300 351 China Buffaloes 2007 Head 0.0
151414 58 Ecuador Mules 2009 Head NaN
214576 97 Hungary Rabbits and hares 1979 1000 Head NaN
185544 79 Germany Mules 1991 Head 32.0
189251 81 Ghana Sheep 1988 Head 5000.0
In [25]:
select_pigs = df_useful['Live animals import quantities']['Item']=='Pigs'
pigs_df = df_useful['Live animals import quantities'][select_pigs]

select_switzerland = pigs_df['Area']=='Switzerland'
select_france = pigs_df['Area']=='France'
select_austria = pigs_df['Area']=='Austria'
select_canada = pigs_df['Area']=='Canada'
ax = pigs_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
Out[25]:
<matplotlib.legend.Legend at 0x1bd2c622788>
In [26]:
select_USSR = pigs_df['Area']=='USSR'
select_russia = pigs_df['Area']=='Russian Federation'
select_ukraine = pigs_df['Area']=='Ukraine'
ax = pigs_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
Out[26]:
<matplotlib.legend.Legend at 0x1bd48cfd808>
In [27]:
display(df_useful['Live animals export quantities'].sample(5))
Area Code Area Item Year Unit Value
260301 120 Lao People's Democratic Republic Buffaloes 2010 Head 0.0
542116 5102 Middle Africa Bovine, Animals 1980 Head 262790.0
389361 188 Saint Kitts and Nevis Cattle 2003 Head NaN
529062 5000 World Pigs 1964 Head 2890291.0
602064 5400 Europe Ducks 1964 1000 Head 0.0
In [28]:
select_pigs = df_useful['Live animals export quantities']['Item']=='Pigs'
pigs_df = df_useful['Live animals export quantities'][select_pigs]

select_switzerland = pigs_df['Area']=='Switzerland'
select_france = pigs_df['Area']=='France'
select_austria = pigs_df['Area']=='Austria'
select_canada = pigs_df['Area']=='Canada'
ax = pigs_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
Out[28]:
<matplotlib.legend.Legend at 0x1bd2b6a5a08>
In [29]:
select_USSR = pigs_df['Area']=='USSR'
select_russia = pigs_df['Area']=='Russian Federation'
select_ukraine = pigs_df['Area']=='Ukraine'
ax = pigs_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
Out[29]:
<matplotlib.legend.Legend at 0x1bd2c333488>
In [30]:
selection_import_quantities = df['Crops trade']["Element"] == 'Import Quantity'
selection_export_quantities = df['Crops trade']["Element"] == 'Export Quantity'

df_useful['Crops import quantities'] = df['Crops trade'][selection_import_quantities].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Crops export quantities'] = df['Crops trade'][selection_export_quantities].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
In [31]:
display(df_useful['Crops import quantities'].sample(5))
Area Code Area Item Year Unit Value
429776 10 Australia Coconuts 1989 tonnes 1898.0
987490 15 Belgium-Luxembourg Meat Prepared Pres 1984 tonnes 18756.0
1755302 35 Cabo Verde Rice 1992 tonnes 16867.0
2104921 40 Chile Poppy seed 1970 tonnes NaN
8624747 185 Russian Federation Tobacco 2010 tonnes 258173.0
In [32]:
select_Maize = df_useful['Crops import quantities']['Item']=='Maize'
maize_df = df_useful['Crops import quantities'][select_Maize]

select_switzerland = maize_df['Area']=='Switzerland'
select_france = maize_df['Area']=='France'
select_austria = maize_df['Area']=='Austria'
select_canada = maize_df['Area']=='Canada'
ax = maize_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
Out[32]:
<matplotlib.legend.Legend at 0x1bd2c560388>
In [33]:
select_USSR = maize_df['Area']=='USSR'
select_russia = maize_df['Area']=='Russian Federation'
select_ukraine = maize_df['Area']=='Ukraine'
ax = maize_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
Out[33]:
<matplotlib.legend.Legend at 0x1bd2c01e308>
In [34]:
display(df_useful['Crops export quantities'].sample(5))
Area Code Area Item Year Unit Value
4496294 89 Guatemala Vermouths & similar 2010 tonnes 0.0
3902776 68 France Offals, edible, cattle 1975 tonnes 3581.0
1834826 32 Cameroon Chillies and peppers, dry 1963 tonnes 1.0
7375166 157 Nicaragua Meal, meat 1982 tonnes 41.0
5491871 110 Japan Feed and meal, gluten 1982 tonnes 6138.0
In [35]:
select_Maize = df_useful['Crops export quantities']['Item']=='Maize'
maize_df = df_useful['Crops export quantities'][select_Maize]

select_switzerland = maize_df['Area']=='Switzerland'
select_france = maize_df['Area']=='France'
select_austria = maize_df['Area']=='Austria'
select_canada = maize_df['Area']=='Canada'
ax = maize_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
Out[35]:
<matplotlib.legend.Legend at 0x1bd2be11848>
In [36]:
select_USSR = maize_df['Area']=='USSR'
select_russia = maize_df['Area']=='Russian Federation'
select_ukraine = maize_df['Area']=='Ukraine'
ax = maize_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
Out[36]:
<matplotlib.legend.Legend at 0x1bd2c470dc8>
1.D.a.v. Extracting average CPI of each year from the "Consumer price indices" dataset

The "Consumer price indices" dataset contains monthly data. In order to have a uniform dataframe, and as other dataframes have yearly data, we group it by (Country, Year) and compute the mean over the monthes.

In [37]:
df_useful['Consumer price indices'] =  df['Consumer price indices'][['Area',"Year",'Value']] \
                                        .dropna() \
                                        .groupby(['Area',"Year"]) \
                                        .mean() \
                                        .reset_index() \
                                        .dropna()

With samples and plots, we remark that this dataset only start in 2000 wheareas other ones start in 1970.

In [38]:
display(df_useful['Consumer price indices'].sample(5))
Area Year Value
2979 Tajikistan 2001 39.298600
2677 Serbia 2001 38.528525
1839 Malaysia 2007 92.704678
608 Chad 2008 92.878315
2254 Oman 2016 106.647419
In [39]:
select_switzerland = df_useful['Consumer price indices']['Area']=='Switzerland'
select_france = df_useful['Consumer price indices']['Area']=='France'
select_austria = df_useful['Consumer price indices']['Area']=='Austria'
select_canada = df_useful['Consumer price indices']['Area']=='Canada'
ax = df_useful['Consumer price indices'][select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = df_useful['Consumer price indices'][select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful['Consumer price indices'][select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful['Consumer price indices'][select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
Out[39]:
<matplotlib.legend.Legend at 0x1bd2c1aaec8>
In [40]:
select_russia = df_useful["Consumer price indices"]['Area']=='Russian Federation'
select_ukraine = df_useful["Consumer price indices"]['Area']=='Ukraine'
ax = df_useful["Consumer price indices"][select_russia].plot(x ='Year', y='Value', kind = 'line')
ax = df_useful["Consumer price indices"][select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(['Russia', 'Ukraine'])
Out[40]:
<matplotlib.legend.Legend at 0x1bd2c023748>
1.D.a.vi. Removing areas which are not countries

Having a more detailled look at the dataset, we have remarked that the areas which are real countries are exactely the ones with an "Area Code" below $5000$ but not in $[261, 269]$.

In [41]:
#remove Area code >= 5000 or in [261, 269] (EU)
for df_name in df_useful :
    if 'Area Code' in df_useful[df_name].keys() : 
        print ("Removing areas which are not countries in", df_name)
        selection_countries = df_useful[df_name]['Area Code']<261 
        selection_countries = selection_countries | df_useful[df_name]['Area Code']>269
        selection_countries = selection_countries & df_useful[df_name]['Area Code']<5000
        df_useful[df_name] = df_useful[df_name][selection_countries]
        display(df_useful[df_name].sample(5))
    else :
        print (df_name, "is already clean")
Removing areas which are not countries in GDP
Area Code Area Year Value
511342 155 Vanuatu 2002 264.219504
268496 124 Libya 1973 8003.296604
376277 174 Portugal 2009 243731.066790
400770 191 Saint Vincent and the Grenadines 1988 193.975441
189370 84 Greece 2010 299361.576560
Removing areas which are not countries in Crops Area harvested
Area Code Area Item Year Unit Value
833764 102 Iran (Islamic Republic of) Rapeseed 2005 ha 161000.0
315756 40 Chile Peaches and nectarines 1986 ha 15160.0
53593 9 Argentina Chillies and peppers, dry 1967 ha 1700.0
1060633 133 Mali Karite nuts (sheanuts) 1973 ha 32000.0
191573 80 Bosnia and Herzegovina Garlic 2000 ha 1889.0
Removing areas which are not countries in Crops Production
Area Code Area Item Year Unit Value
212947 21 Brazil Citrus Fruit,Total 2004 tonnes 20530052.0
1352864 173 Poland Plums and sloes 1974 tonnes 138000.0
2293211 5400 Europe Peas, dry 1979 tonnes 3807830.0
2472966 5801 Least Developed Countries Cauliflowers and broccoli 2006 tonnes 138924.0
853811 104 Ireland Carrots and turnips 2003 tonnes 23000.0
Removing areas which are not countries in Crops Seed
Area Code Area Item Year Unit Value
2317129 5401 Eastern Europe Rice, paddy 1968 tonnes 90969.0
1026565 129 Madagascar Peas, dry 1993 tonnes 330.0
359672 41 China, mainland Millet 1996 tonnes 50464.0
1049768 131 Malaysia Sweet potatoes 2010 tonnes 2006.0
172256 18 Bhutan Soybeans 1970 tonnes 0.0
Removing areas which are not countries in Crops Yield
Area Code Area Item Year Unit Value
2583672 5817 Net Food Importing Developing Countries Onions, dry 2001 hg/ha 134112.0
56177 9 Argentina Maize 1967 hg/ha 24663.0
977030 119 Latvia Rapeseed 2005 hg/ha 20406.0
2147758 5207 South America Yams 1969 hg/ha 82201.0
1197927 153 New Caledonia Wheat 1994 hg/ha 20571.0
Removing areas which are not countries in Livestock production
Area Code Area Item Year Unit Value
57162 102 Iran (Islamic Republic of) Camels 2007 Head 152000.0
84700 159 Nigeria Cattle 1977 Head 11223000.0
20295 33 Canada Chickens 1961 1000 Head 69962.0
26885 214 China, Taiwan Province of Poultry Birds 2014 1000 Head 101652.0
39415 63 Estonia Rabbits and hares 2004 1000 Head 43.0
Removing areas which are not countries in Live animals import quantities
Area Code Area Item Year Unit Value
332977 158 Niger Goats 1984 Head 14400.0
369240 174 Portugal Beehives 1976 No NaN
561610 5203 Northern America Horses 1970 Head 18015.0
622986 5500 Oceania Buffaloes 1970 Head NaN
547324 5104 Southern Africa Asses 1994 Head 40.0
Removing areas which are not countries in Live animals export quantities
Area Code Area Item Year Unit Value
661168 5817 Net Food Importing Developing Countries Mules 1973 Head 0.0
241813 109 Jamaica Pigs 1974 Head 0.0
274494 256 Luxembourg Goats 2013 Head 0.0
315336 149 Nepal Ducks 1976 1000 Head NaN
54024 19 Bolivia (Plurinational State of) Horses 1984 Head 0.0
Removing areas which are not countries in Crops import quantities
Area Code Area Item Year Unit Value
256046 8 Antigua and Barbuda Onions 1964 tonnes 237.0
7766284 165 Pakistan Coconuts 1971 tonnes 24.0
9888073 210 Sweden Wool, degreased 1962 tonnes 4100.0
3040094 167 Czechia Spinach 2011 tonnes 442.0
2464859 214 China, Taiwan Province of Juice, fruit nes 1985 tonnes 3452.0
Removing areas which are not countries in Crops export quantities
Area Code Area Item Year Unit Value
7879043 166 Panama Vermouths & similar 1999 tonnes 0.0
14225097 5802 Land Locked Developing Countries Cocoa, paste 1993 tonnes 0.0
3541203 60 El Salvador Plums and sloes 1976 tonnes NaN
29844 3 Albania Cherries, sour 1966 tonnes NaN
10119713 216 Thailand Rubber natural dry 2008 tonnes 1995524.0
Consumer price indices is already clean

1.D.b. Handling of the missing data

In this section, we will explain how we will handle the missing data in previous dataframes for maps.

1.D.b.i. Highlighting the problem
In [42]:
select_USSR = df_useful["GDP"]['Area']=='USSR'
select_russia = df_useful["GDP"]['Area']=='Russian Federation'
select_ukraine = df_useful["GDP"]['Area']=='Ukraine'
ax = df_useful["GDP"][select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = df_useful["GDP"][select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful["GDP"][select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
Out[42]:
<matplotlib.legend.Legend at 0x1bd2b629bc8>

In order to vizualize folium maps, we need to associate each country a value. The geojson file that we use is not timestamped and only countries that exist nowadays are inside it. As some countries has been dissolved during the past 50 years, our folium maps won't be complete. For instance, we do not have any value for Ukraine from 1970 to 1989. Our idea to fix this issue is presented in the next paragraph.

1.D.b.ii. Proposed correction

Our idea is to map the former country value to each of the current ones. For instance in 1982, USSR GDP is around one trillion $. Therefore, if we associate (only for folium map purposes) this value to each current country that succeeded USSR, all these countries will appear the same color in the folium map, i.e. all the USSR area will appear the same color (and the good one).

In order to do so, one need to identify which countries appeared and disappeared from the dataset and at which year. Then we will use this result along with some historical research in our visualise_world_data_folium function (1Ea).

In [43]:
countries_formation_years = {}
for country in df_useful["GDP"]["Area"].unique():
    selection = df_useful["GDP"]["Area"] == country
    year_in, year_out = df_useful["GDP"][selection].dropna()["Year"].min(), df_useful["GDP"][selection].dropna()["Year"].max()
    for year in (year_in, year_out):
        if year not in countries_formation_years :
            countries_formation_years[year] = []
    countries_formation_years[year_in].append((country,'+'))
    countries_formation_years[year_out].append((country,'-'))

countries_formation_years.pop(1970)
countries_formation_years.pop(2015)
for year in sorted(list(countries_formation_years)):
    print (year, countries_formation_years[year])
1988 [('Yemen Ar Rp', '-'), ('Yemen Dem', '-')]
1989 [('Czechoslovakia', '-'), ('Ethiopia PDR', '-'), ('USSR', '-'), ('Yemen', '+'), ('Yugoslav SFR', '-')]
1990 [('Armenia', '+'), ('Azerbaijan', '+'), ('Belarus', '+'), ('Bosnia and Herzegovina', '+'), ('Croatia', '+'), ('Czechia', '+'), ('Eritrea', '+'), ('Estonia', '+'), ('Ethiopia', '+'), ('Georgia', '+'), ('Kazakhstan', '+'), ('Kyrgyzstan', '+'), ('Latvia', '+'), ('Lithuania', '+'), ('Montenegro', '+'), ('Republic of Moldova', '+'), ('Russian Federation', '+'), ('Serbia', '+'), ('Slovakia', '+'), ('Slovenia', '+'), ('Tajikistan', '+'), ('The former Yugoslav Republic of Macedonia', '+'), ('Timor-Leste', '+'), ('Turkmenistan', '+'), ('Ukraine', '+'), ('Uzbekistan', '+'), ('Central Asia', '+')]
1999 [('Kosovo', '+')]
2005 [('Curaçao', '+'), ('Sint Maarten (Dutch Part)', '+')]
2007 [('Sudan (former)', '-')]
2008 [('South Sudan', '+'), ('Sudan', '+')]
2012 [('Netherlands Antilles (former)', '-')]

1.E. Preprocessing the data set

In this part, we will finish prepocessing the datasets. More precisely, we will deal with country names and normalizing the features.

1.E.a. Converting country names between different naming conventions

Some countries have different names in the geojson file and in the dataset. We first start by correcting them.

In [44]:
dic = {'Czechia': "Czech Republic",
       'Russian Federation':'Russia',
       "Serbia":"Republic of Serbia",
       'The former Yugoslav Republic of Macedonia':'Macedonia',
       'China, mainland':'China',
       'Viet Nam':'Vietnam',
       'Venezuela (Bolivarian Republic of)':'Venezuela',
       'Iran (Islamic Republic of)':'Iran',
       'Syrian Arab Republic':"Syria",
       'Bolivia (Plurinational State of)': 'Bolivia',
       "Côte d'Ivoire": "Ivory Coast",
       'Congo':"Republic of the Congo",
       "Lao People's Democratic Republic":'Laos',
       "Democratic People's Republic of Korea":"North Korea",
       'Republic of Korea':"South Korea"}

def correct_country_names(old_name):
    if old_name in dic.keys() :
        return dic[old_name]
    return old_name
In [45]:
for df_name in df_useful :
    print (df_name)
    df_useful[df_name]["Area"] = df_useful[df_name]["Area"].apply(correct_country_names)
GDP
Crops Area harvested
Crops Production
Crops Seed
Crops Yield
Livestock production
Live animals import quantities
Live animals export quantities
Crops import quantities
Crops export quantities
Consumer price indices

Then, we do a function that takes as input a dataframe and a year and produces the corresponding folium map. This function also handles dissolutions of countries as suggested before.

In [46]:
def visualise_world_data_folium(df, year, logScale=True):
    dic = {'USSR':                            ['Armenia', 'Azerbaijan','Belarus', 'Estonia', 'Georgia',
                                               'Kazakhstan', 'Kyrgyzstan', 'Latvia', 'Lithuania',
                                               'Montenegro', 'Republic of Moldova', 'Russia',
                                               'Republic of Serbia', 'Timor-Leste', 'Turkmenistan', 'Ukraine',
                                               'Uzbekistan'],
           'Ethiopia PDR':                     ['Eritrea','Ethiopia'],
           'Yugoslav SFR':                     ['Kosovo', 'Slovenia', 'Croatia',
                                                'Macedonia', 'Bosnia and Herzegovina'],
           'Yemen Dem' :                       ['Yemen'],        
           'Czechoslovakia':                   ["Czech Republic", 'Slovakia'],
           'Netherlands Antilles (former)':    ['Curaçao', 'Sint Maarten (Dutch Part)'],
           'Sudan (former)':                   ['South Sudan', 'Sudan']
          }
    def add_new_names(old_name):
        if old_name in dic.keys() :
            return dic[old_name]
        return old_name
    to_plot=df[df["Year"]==year]
    to_plot=(to_plot[['Area','Value']]
             .dropna()
             .groupby('Area')             
             .mean()
             .reset_index()
             .dropna())    
    to_plot['Area']=to_plot['Area'].apply(add_new_names)
    to_plot = to_plot.explode('Area')
    if logScale :
        to_plot.Value=np.log10(to_plot.Value)
    
    m = folium.Map(location=[40,-10],zoom_start=1.6)
    folium.Choropleth(
        geo_data=f"https://raw.githubusercontent.com/python-visualization/folium/master/examples/data/world-countries.json",
        data=to_plot,
        columns=['Area', 'Value'],
        key_on='feature.properties.name',
        fill_color='YlGn',fill_opacity=0.7,line_opacity=0.2,nan_fill_opacity=0.0
    ).add_to(m)

    folium.LayerControl().add_to(m)

    return(m)

We can know use it to produce some maps. For instance, we plot below the map of GDP for the year 1985 (befor edissolution of USSR).

In [47]:
display(visualise_world_data_folium(df_useful["GDP"], 1985, True))

1.E.b. Normalization and log scales

Some of our features seem to be right skewed. At first glance it seems that they look like power laws.

For instance the distribution of GDP look a bit like a power law:

In [48]:
sns.distplot(df_useful["GDP"]["Value"], rug=False, hist=False)
Out[48]:
<matplotlib.axes._subplots.AxesSubplot at 0x1bd2bef8248>

As we later want to train some Machine Learning models, we log those values so that their distribution look a bit more like a normal distribution.

In [49]:
#looks better with log scale
sns.distplot(np.log(df_useful["GDP"]["Value"]), rug=False, hist=False)
Out[49]:
<matplotlib.axes._subplots.AxesSubplot at 0x1bd2bfb8d88>

The new distribution indeed looks better to train models on it.

1.F. Making one uniformized dataframe

In this part, we will make one uniformized dataframe uni_df with the following columns.

Country | Year | GDP | Crops production columns | Livestock production columns | Crops importation columns | Livestock importation columns | Crops exportation columns | Livestock exportation columns | CPI

In this uniformized dataframe, a tuple (Country, Year) uniquely identifies a row.

1.F.a. Pivoting dataframes with items

The current dataframes have several rows for a given (Country, Year). Each of this row correspond to one item. We would like to have a unique row for a given (Country, Year) and one column per item:

In [50]:
need_pivot = ['Crops Area harvested',
              'Crops Production',
              'Crops Seed',
              'Crops Yield',
              'Livestock production',
              'Live animals import quantities',
              'Live animals export quantities',
              'Crops import quantities',
              'Crops export quantities']

def rename_columns(x, word):
    if x not in ['Area', 'Year', 'ha', 'tonnes', 'hg/ha', 'Head', '1000 Head']:
        return x + ' ' + word
    return x

df_useful['GDP'] = df_useful['GDP'].rename(columns = {'Value':'(GDP, million $)'})[["Area",'Year','(GDP, million $)']]
df_useful['Consumer price indices'] = df_useful['Consumer price indices'].rename(columns = {'Value':'(Consumer price indices, %)'})[["Area",'Year','(Consumer price indices, %)']]

for df_name in need_pivot :
    df_useful[df_name] = pd.pivot_table(df_useful[df_name], index=["Area",'Year'], columns=["Item","Unit"], values="Value").rename(columns=lambda x: rename_columns(x, df_name))
    display(df_useful[df_name].sample(5))
Item Anise, badian, fennel, coriander Crops Area harvested Apples Crops Area harvested Apricots Crops Area harvested Areca nuts Crops Area harvested Artichokes Crops Area harvested Asparagus Crops Area harvested Avocados Crops Area harvested Bambara beans Crops Area harvested Bananas Crops Area harvested Barley Crops Area harvested ... Sweet potatoes Crops Area harvested Tangerines, mandarins, clementines, satsumas Crops Area harvested Taro (cocoyam) Crops Area harvested Tomatoes Crops Area harvested Tung nuts Crops Area harvested Vegetables&Melons, Total Crops Area harvested Vetches Crops Area harvested Watermelons Crops Area harvested Wheat Crops Area harvested Yams Crops Area harvested
Unit ha ha ha ha ha ha ha ha ha ha ... ha ha ha ha ha ha ha ha ha ha
Area Year
Saint Vincent and the Grenadines 1986 NaN 150.0 NaN NaN NaN NaN NaN NaN 2929.0 NaN ... 5600.0 NaN NaN NaN NaN 380.0 NaN NaN NaN 200.0
New Zealand 1994 NaN 15257.0 844.0 NaN NaN 3200.0 1375.0 NaN NaN 74500.0 ... 1048.0 914.0 NaN 2394.0 NaN 60618.0 NaN 250.0 42000.0 NaN
Haiti 1972 NaN NaN NaN NaN NaN NaN NaN NaN 28000.0 NaN ... 48000.0 NaN NaN 194.0 NaN 41085.0 NaN NaN NaN 24500.0
European Union 1995 24599.0 696242.0 85802.0 NaN 85937.0 59568.0 18490.0 NaN 10241.0 15262773.0 ... 4576.0 125005.0 115.0 380775.0 NaN 3015899.0 202357.0 103190.0 25739084.0 129.0
Israel 1976 NaN 2740.0 670.0 NaN 450.0 NaN 2170.0 NaN 1580.0 26560.0 ... 27.0 2550.0 NaN 6040.0 NaN 29710.0 2170.0 6410.0 108220.0 NaN

5 rows × 120 columns

Item Anise, badian, fennel, coriander Crops Production Apples Crops Production Apricots Crops Production Areca nuts Crops Production Artichokes Crops Production Asparagus Crops Production Avocados Crops Production Bambara beans Crops Production Bananas Crops Production Barley Crops Production ... Sweet potatoes Crops Production Tangerines, mandarins, clementines, satsumas Crops Production Taro (cocoyam) Crops Production Tomatoes Crops Production Tung nuts Crops Production Vegetables&Melons, Total Crops Production Vetches Crops Production Watermelons Crops Production Wheat Crops Production Yams Crops Production
Unit tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes ... tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes
Area Year
Latvia 2014 NaN 9623.0 NaN NaN NaN NaN NaN NaN NaN 418800.0 ... NaN NaN NaN 4913.0 NaN 191720.0 39.0 NaN 1467500.0 NaN
Cyprus 1970 10.0 7518.0 2032.0 NaN 5080.0 NaN 55.0 NaN 2540.0 55880.0 ... NaN 3518.0 5080.0 23368.0 NaN 117159.0 6604.0 19304.0 43180.0 NaN
Melanesia 2002 NaN NaN NaN NaN NaN NaN NaN NaN 878498.0 NaN ... 582067.0 NaN 325339.0 3076.0 NaN 507616.0 NaN 3356.0 10.0 320934.0
Austria 2006 NaN 509139.0 25203.0 NaN NaN 2161.0 NaN NaN NaN 914052.0 ... NaN NaN NaN 39105.0 NaN 530293.0 1917.0 374.0 1396300.0 NaN
Equatorial Guinea 2001 NaN NaN NaN NaN NaN NaN NaN NaN 22570.0 NaN ... 42606.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 122 columns

Item Anise, badian, fennel, coriander Crops Seed Bambara beans Crops Seed Bananas Crops Seed Barley Crops Seed Beans, dry Crops Seed Broad beans, horse beans, dry Crops Seed Buckwheat Crops Seed Cabbages and other brassicas Crops Seed Carrots and turnips Crops Seed Cassava Crops Seed ... Sorghum Crops Seed Soybeans Crops Seed Sugar cane Crops Seed Sweet potatoes Crops Seed Taro (cocoyam) Crops Seed Vegetables&Melons, Total Crops Seed Vetches Crops Seed Watermelons Crops Seed Wheat Crops Seed Yams Crops Seed
Unit tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes ... tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes
Area Year
Paraguay 2013 NaN NaN NaN NaN 4200.0 1437.0 NaN NaN NaN NaN ... 250.0 157500.0 NaN NaN NaN NaN NaN NaN 40000.0 NaN
Greece 1966 NaN NaN NaN 49174.0 7128.0 3182.0 NaN NaN NaN NaN ... 162.0 NaN NaN NaN NaN NaN 3394.0 NaN 180000.0 NaN
Morocco 1994 NaN NaN NaN 126280.0 605.0 5900.0 NaN NaN NaN NaN ... 584.0 NaN NaN NaN NaN NaN 692.0 NaN 196790.0 NaN
Low Income Food Deficit Countries 2014 NaN 18029.0 NaN 354914.0 633806.0 41161.0 0.0 NaN NaN NaN ... 685763.0 723117.0 24794542.0 189314.0 27062.0 425.0 14206.0 NaN 5861629.0 2247225.0
Ecuador 1975 NaN NaN NaN 7160.0 2581.0 719.0 NaN NaN NaN NaN ... 11.0 581.0 NaN NaN NaN 7378.0 NaN NaN 9941.0 NaN

5 rows × 52 columns

Item Anise, badian, fennel, coriander Crops Yield Apples Crops Yield Apricots Crops Yield Areca nuts Crops Yield Artichokes Crops Yield Asparagus Crops Yield Avocados Crops Yield Bambara beans Crops Yield Bananas Crops Yield Barley Crops Yield ... Sweet potatoes Crops Yield Tangerines, mandarins, clementines, satsumas Crops Yield Taro (cocoyam) Crops Yield Tomatoes Crops Yield Tung nuts Crops Yield Vegetables&Melons, Total Crops Yield Vetches Crops Yield Watermelons Crops Yield Wheat Crops Yield Yams Crops Yield
Unit hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha ... hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha
Area Year
Niger 1977 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 65556.0 NaN NaN 80000.0 NaN 142081.0 NaN NaN 15833.0 NaN
Antigua and Barbuda 1995 NaN NaN NaN NaN NaN NaN NaN NaN 115610.0 NaN ... 58333.0 NaN 35714.0 83333.0 NaN 71059.0 NaN NaN NaN 38462.0
Canada 1986 NaN 155270.0 49787.0 NaN NaN 15148.0 NaN NaN NaN 30170.0 ... NaN NaN NaN 387866.0 NaN 180668.0 NaN NaN 22052.0 NaN
Guadeloupe 1964 NaN NaN NaN NaN NaN NaN NaN NaN 205323.0 NaN ... 130000.0 NaN NaN 187500.0 NaN 83700.0 NaN 250000.0 NaN 127778.0
Southern Africa 1997 NaN 254822.0 165397.0 NaN NaN NaN 82932.0 NaN 121729.0 13378.0 ... 36776.0 188527.0 NaN 324382.0 NaN 164427.0 NaN 130130.0 17581.0 NaN

5 rows × 120 columns

Item Animals live nes Livestock production Asses Livestock production Beehives Livestock production Buffaloes Livestock production Camelids, other Livestock production Camels Livestock production Cattle Livestock production Cattle and Buffaloes Livestock production Chickens Livestock production Ducks Livestock production ... Horses Livestock production Mules Livestock production Pigeons, other birds Livestock production Pigs Livestock production Poultry Birds Livestock production Rabbits and hares Livestock production Rodents, other Livestock production Sheep Livestock production Sheep and Goats Livestock production Turkeys Livestock production
Unit Head Head No Livestock production Head Head Head Head Head 1000 Head 1000 Head ... Head Head 1000 Head Head 1000 Head 1000 Head 1000 Head Head Head 1000 Head
Area Year
Tokelau 1976 NaN NaN NaN NaN NaN NaN NaN NaN 3.0 NaN ... NaN NaN NaN 900.0 3.0 NaN NaN NaN NaN NaN
Ethiopia PDR 1988 NaN 4800000.0 3750000.0 NaN NaN 1030000.0 27000000.0 27000000.0 57000.0 NaN ... 2550000.0 550000.0 NaN 18000.0 57000.0 NaN NaN 24000000.0 42000000.0 NaN
Least Developed Countries 2009 NaN 13757803.0 11380779.0 10881477.0 NaN 18636646.0 260250659.0 271132136.0 1006123.0 73362.0 ... 5888871.0 502440.0 700.0 32443897.0 1099383.0 2739.0 NaN 179466651.0 455100492.0 2462.0
Italy 1999 NaN 23000.0 900000.0 200000.0 NaN NaN 7129000.0 7329000.0 106000.0 NaN ... 288000.0 10000.0 NaN 8323000.0 131000.0 67000.0 NaN 10894000.0 12225000.0 25000.0
Czech Republic 2009 NaN NaN 497946.0 NaN NaN NaN 1349286.0 1349286.0 24042.0 402.0 ... 29887.0 NaN NaN 1971000.0 24839.0 5500.0 NaN 196913.0 218622.0 376.0

5 rows × 22 columns

Item Animals live nes Live animals import quantities Asses Live animals import quantities Beehives Live animals import quantities Bovine, Animals Live animals import quantities Buffaloes Live animals import quantities Camelids, other Live animals import quantities Camels Live animals import quantities Cattle Live animals import quantities Chickens Live animals import quantities ... Mules Live animals import quantities Pigeons, other birds Live animals import quantities Pigs Live animals import quantities Rabbits and hares Live animals import quantities Rodents, other Live animals import quantities Sheep Live animals import quantities Sheep and Goats Live animals import quantities Turkeys Live animals import quantities
Unit Head Head No Live animals import quantities Head Head Head Head Head 1000 Head Head ... Head 1000 Head Head Head 1000 Head 1000 Head Head Head 1000 Head Head
Area Year
Democratic Republic of the Congo 2009 0.0 NaN NaN 4269.0 NaN NaN NaN 4269.0 600.0 NaN ... NaN NaN NaN 0.0 NaN NaN 0.0 0.0 NaN NaN
British Virgin Islands 1980 NaN NaN NaN 0.0 NaN NaN NaN 0.0 NaN NaN ... NaN NaN NaN NaN NaN NaN 0.0 0.0 NaN NaN
Tonga 1962 NaN NaN NaN 0.0 NaN NaN NaN 0.0 0.0 NaN ... NaN NaN NaN 0.0 NaN NaN NaN NaN NaN NaN
Poland 1961 NaN NaN NaN 700.0 NaN NaN NaN 700.0 0.0 NaN ... NaN NaN NaN 100.0 0.0 NaN 300.0 300.0 NaN NaN
Tonga 1999 NaN NaN NaN 0.0 NaN NaN NaN 0.0 0.0 NaN ... NaN NaN NaN 0.0 NaN NaN NaN NaN NaN NaN

5 rows × 24 columns

Item Animals live nes Live animals export quantities Asses Live animals export quantities Beehives Live animals export quantities Bovine, Animals Live animals export quantities Buffaloes Live animals export quantities Camelids, other Live animals export quantities Camels Live animals export quantities Cattle Live animals export quantities Chickens Live animals export quantities ... Mules Live animals export quantities Pigeons, other birds Live animals export quantities Pigs Live animals export quantities Rabbits and hares Live animals export quantities Rodents, other Live animals export quantities Sheep Live animals export quantities Sheep and Goats Live animals export quantities Turkeys Live animals export quantities
Unit Head Head No Live animals export quantities Head Head Head Head Head 1000 Head Head ... Head 1000 Head Head Head 1000 Head 1000 Head Head Head 1000 Head Head
Area Year
Sudan (former) 1976 0.0 NaN NaN 1498.0 NaN NaN 5000.0 1498.0 NaN NaN ... NaN NaN NaN NaN NaN NaN 126000.0 149000.0 NaN NaN
Cameroon 1997 NaN NaN NaN 2968.0 NaN NaN NaN 2968.0 6.0 NaN ... NaN NaN NaN NaN NaN NaN 1300.0 1300.0 NaN NaN
Vanuatu 1981 NaN NaN NaN 0.0 NaN NaN NaN 0.0 NaN NaN ... NaN NaN NaN 0.0 NaN NaN NaN NaN NaN NaN
EU(12)ex.int 1997 NaN 96.0 NaN 279684.0 NaN NaN NaN 279684.0 89802.0 89802.0 ... 1.0 50.0 50.0 171006.0 59.0 NaN 33708.0 36092.0 12842.0 12842.0
Spain 2006 NaN 0.0 NaN 117918.0 NaN NaN NaN 117918.0 11006.0 NaN ... 1.0 0.0 NaN 1307096.0 365.0 NaN 498459.0 511423.0 280.0 NaN

5 rows × 24 columns

Item Alfalfa meal and pellets Crops import quantities Almonds shelled Crops import quantities Animal Oil+Fat+Grs Crops import quantities Animal Vegetable Oil Crops import quantities Animal fats Crops import quantities Anise, badian, fennel, coriander Crops import quantities Apples Crops import quantities Apricots Crops import quantities Apricots, dry Crops import quantities Artichokes Crops import quantities ... Wheat+Flour,Wheat Equivalent Crops import quantities Whey, Pres+Concen Crops import quantities Whey, condensed Crops import quantities Whey, dry Crops import quantities Wine Crops import quantities Wine+Vermouth+Sim. Crops import quantities Wool, degreased Crops import quantities Wool, greasy Crops import quantities Wool, hair waste Crops import quantities Yoghurt, concentrated or not Crops import quantities
Unit tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes ... tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes
Area Year
Armenia 2004 NaN 74.0 4851.0 26281.0 4851.0 53.0 0.0 NaN 29.0 NaN ... 404255.0 25.0 NaN 25.0 79.0 107.0 0.0 10.0 NaN 6.0
Nigeria 1976 NaN NaN 33701.0 62750.0 33701.0 NaN NaN NaN NaN NaN ... 735497.0 NaN NaN NaN 5987.0 5987.0 NaN NaN 0.0 NaN
Singapore 1992 NaN NaN 4849.0 959192.0 4849.0 18297.0 42398.0 111.0 NaN NaN ... 398825.0 1786.0 NaN 1786.0 3954.0 4236.0 0.0 2.0 0.0 1473.0
Rwanda 1975 NaN NaN 0.0 5625.0 0.0 NaN NaN NaN NaN NaN ... 10198.0 NaN NaN NaN 117.0 117.0 NaN NaN NaN NaN
1998 NaN NaN 0.0 8501.0 0.0 NaN 0.0 NaN NaN NaN ... 18968.0 NaN NaN NaN 144.0 173.0 0.0 NaN NaN 3.0

5 rows × 454 columns

Item Alfalfa meal and pellets Crops export quantities Almonds shelled Crops export quantities Animal Oil+Fat+Grs Crops export quantities Animal Vegetable Oil Crops export quantities Animal fats Crops export quantities Anise, badian, fennel, coriander Crops export quantities Apples Crops export quantities Apricots Crops export quantities Apricots, dry Crops export quantities Artichokes Crops export quantities ... Wheat+Flour,Wheat Equivalent Crops export quantities Whey, Pres+Concen Crops export quantities Whey, condensed Crops export quantities Whey, dry Crops export quantities Wine Crops export quantities Wine+Vermouth+Sim. Crops export quantities Wool, degreased Crops export quantities Wool, greasy Crops export quantities Wool, hair waste Crops export quantities Yoghurt, concentrated or not Crops export quantities
Unit tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes ... tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes
Area Year
Kuwait 2009 394.0 NaN 98.0 5608.0 98.0 2.0 534.0 0.0 3.0 NaN ... 46013.0 NaN NaN NaN NaN NaN 183.0 156.0 0.0 0.0
Guinea 1986 NaN NaN NaN 0.0 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Equatorial Guinea 1985 NaN NaN NaN 0.0 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Lebanon 2004 0.0 13.0 386.0 12777.0 386.0 18.0 55338.0 1729.0 8.0 28.0 ... 34884.0 NaN NaN NaN 1521.0 1539.0 0.0 103.0 0.0 12.0
EU(27)ex.int 1991 41665.0 3165.0 49961.0 2183425.0 49961.0 6510.0 161283.0 12154.0 1866.0 1204.0 ... 21443216.0 48316.0 105.0 48211.0 791567.0 817964.0 23989.0 5984.0 7349.0 10809.0

5 rows × 445 columns

Some Nan values have appeared. As we have datasets that have great outliers, we would replace those Nan values by the median of the column:

In [51]:
# Deal with the NaN that appeared
for df_name in df_useful :
    for column in list(df_useful[df_name]):
        if column not in ['Area', 'Year']:
            df_useful[df_name][column].fillna(df_useful[df_name][column].median(), inplace=True)

1.F.b. Merging everything

We are now creating the uniformized dataframe uni_df. Each row corresponds to one tuple (Country, Year) so that we can later group by country or year. In addition to the country ("Area") and the "Year", the columns will either be an economic feature ("GDP", "CPI") or an agricultural one (some crop harvested area, some livestock export quantity, ...)

In [52]:
uni_df = df_useful['GDP']
for df_name in need_pivot :
    uni_df = pd.merge(uni_df, df_useful[df_name], how='outer', on=['Area', 'Year'])
uni_df = pd.merge(uni_df,df_useful['Consumer price indices'], how='outer', on=['Area', 'Year'])

# Deal with the NaN that appeared
for column in list(uni_df):
    if column not in ['Area', 'Year']:
        uni_df[column].fillna(uni_df[column].median(), inplace=True)
uni_df.sample(30)
Out[52]:
Area Year (GDP, million $) (Anise, badian, fennel, coriander Crops Area harvested, ha) (Apples Crops Area harvested, ha) (Apricots Crops Area harvested, ha) (Areca nuts Crops Area harvested, ha) (Artichokes Crops Area harvested, ha) (Asparagus Crops Area harvested, ha) (Avocados Crops Area harvested, ha) ... (Whey, Pres+Concen Crops export quantities, tonnes) (Whey, condensed Crops export quantities, tonnes) (Whey, dry Crops export quantities, tonnes) (Wine Crops export quantities, tonnes) (Wine+Vermouth+Sim. Crops export quantities, tonnes) (Wool, degreased Crops export quantities, tonnes) (Wool, greasy Crops export quantities, tonnes) (Wool, hair waste Crops export quantities, tonnes) (Yoghurt, concentrated or not Crops export quantities, tonnes) (Consumer price indices, %)
4599 Kazakhstan 2010 148047.348240 3100.0 29700.0 2500.0 49401.0 20.0 2173.0 2729.0 ... 60.0 0.0 60.0 60.0 60.0 3709.0 2906.0 0.0 500.0 100.408333
738 Barbados 1972 277.578845 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 9.0 9.0 316.0 510.0 131.0 0.0 97.895927
5275 Madagascar 1996 3994.938306 20.0 4062.0 122.0 49401.0 2800.0 2173.0 2675.0 ... 20.0 0.0 25.0 1.0 1.0 316.0 510.0 131.0 1.0 97.895927
12644 Guam 1996 10084.100000 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 0.0 0.0 316.0 510.0 131.0 1.0 97.895927
14572 EU(25)ex.int 2007 10084.100000 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 396745.0 1298.0 395447.0 1795742.0 1870089.0 18890.0 43803.0 3355.0 39609.0 97.895927
11649 Afghanistan 1967 10084.100000 3100.0 3060.0 6620.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 178.0 195.0 3716.0 0.0 131.0 1.0 97.895927
6165 Nauru 2012 120.602724 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 178.0 195.0 316.0 510.0 131.0 1.0 97.895927
8987 Trinidad and Tobago 1982 8140.416670 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 0.0 0.0 0.0 64.0 64.0 316.0 510.0 131.0 0.0 97.895927
8512 Suriname 2013 5145.655152 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 94.0 94.0 316.0 510.0 131.0 0.0 132.628960
7527 Saint Vincent and the Grenadines 1994 287.679153 3100.0 230.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 178.0 195.0 316.0 510.0 131.0 1.0 97.895927
12784 Israel 1965 10084.100000 3100.0 2000.0 700.0 49401.0 400.0 2173.0 310.0 ... 0.0 0.0 0.0 1550.0 1550.0 0.0 30.0 102.0 1.0 97.895927
8556 Swaziland 2011 4959.971414 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 130.0 ... 0.0 0.0 0.0 1.0 1.0 2.0 0.0 0.0 0.0 106.105198
8308 Spain 1993 523683.365880 3112.0 53700.0 23900.0 49401.0 23600.0 23100.0 7600.0 ... 524.0 12.0 512.0 974684.0 1021689.0 7998.0 6168.0 1705.0 3156.0 97.895927
12988 Martinique 1962 10084.100000 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 178.0 195.0 316.0 510.0 131.0 1.0 97.895927
13811 Southern Europe 1967 10084.100000 934.0 168900.0 34500.0 49401.0 81904.0 12726.0 62.0 ... 66.0 0.0 66.0 817274.0 880142.0 2591.0 1975.0 628.0 0.0 97.895927
6275 Netherlands Antilles (former) 1984 1167.612684 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 178.0 195.0 316.0 510.0 131.0 1.0 97.895927
2929 El Salvador 1996 10315.500000 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 4150.0 ... 20.0 0.0 25.0 1.0 1.0 316.0 0.0 131.0 1.0 97.895927
13604 Saint Pierre and Miquelon 1981 10084.100000 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 178.0 195.0 316.0 510.0 131.0 1.0 97.895927
10514 Central America 1991 385987.589361 3295.0 60855.0 394.0 49401.0 100.0 9166.0 95093.0 ... 5.0 5.0 0.0 1126.0 1421.0 0.0 0.0 0.0 0.0 97.895927
9146 Turkmenistan 2003 11265.996819 3100.0 6200.0 1700.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 18.0 18.0 4200.0 500.0 131.0 1.0 97.895927
9060 Tunisia 2009 43455.757067 12923.0 22700.0 7900.0 49401.0 2150.0 2173.0 27.0 ... 0.0 0.0 25.0 5799.0 5799.0 129.0 15.0 149.0 2149.0 95.770516
7487 Saint Lucia 2000 782.177778 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 60.0 ... 20.0 0.0 25.0 35.0 35.0 316.0 510.0 131.0 1.0 77.871753
13319 Northern Europe 1969 10084.100000 345.0 33906.0 5000.0 49401.0 2800.0 1134.0 2729.0 ... 607.0 0.0 607.0 7949.0 8013.0 19339.0 21026.0 14971.0 0.0 97.895927
6329 New Caledonia 1992 2923.883446 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 0.0 0.0 316.0 510.0 131.0 1.0 97.895927
6756 Palau 2005 179.900000 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 178.0 195.0 316.0 510.0 131.0 1.0 85.350000
8654 Syria 1971 2062.564103 2066.0 8281.0 10250.0 49401.0 2800.0 2173.0 2729.0 ... 20.0 0.0 25.0 0.0 0.0 822.0 6188.0 52.0 1.0 97.895927
8748 Thailand 1973 11279.853137 3100.0 21040.0 5000.0 49401.0 2800.0 2173.0 2729.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 510.0 0.0 0.0 97.895927
11936 Brazil 1963 10084.100000 3100.0 2324.0 5000.0 49401.0 2800.0 2173.0 9500.0 ... 20.0 0.0 25.0 13.0 13.0 0.0 2880.0 396.0 0.0 97.895927
6880 Paraguay 1991 5522.347996 3100.0 103.0 5000.0 49401.0 2800.0 2173.0 2608.0 ... 20.0 0.0 25.0 178.0 195.0 0.0 510.0 131.0 1.0 97.895927
9326 Ukraine 1999 32705.564369 2080.0 245500.0 12000.0 49401.0 2800.0 2173.0 2729.0 ... 134.0 134.0 25.0 15744.0 15995.0 316.0 986.0 131.0 1.0 97.895927

30 rows × 1387 columns

2.A.a Crops and livestock production and trade

TODO

2.A.b Introducing the concept of food self-sufficiency

In this section we will present and compute the notion of food self-sufficiency.

2.A.b.i Basic idea

One may wonder how to know whether a country produce all the food it needs or not. The notion of food-self-sufficency allows to answer to this question. More formally, it is a rate that decribes how much does a country can satisfy to meet its internal consumption needs by production. It describes the extent to which a country is able to feed its population through its domestic food production. We are interested into this measure since we think it could be correlated with the economic conditions of this country.

2.A.b.ii Formula and computation

In order to compute the food self-sufficiency, we will apply the following formula that gives us the food self-sudfficiency as a percentage :

$$\frac{Production \times 100}{Production + Imports – Exports}$$
In [53]:
all_columns = list(uni_df)
production_columns = []
import_columns = []
export_columns = []
for column in all_columns:
    if (type(column)==tuple) and column[1]=='tonnes':
        if 'export quantities' in column[0]:
            export_columns.append(column)
        elif 'import quantities' in column[0]:
            import_columns.append(column)
        elif 'Production' in column[0]:
            production_columns.append(column)
            
uni_df[('All productions','tonnes')] = 0
for column in production_columns :
    uni_df[('All productions','tonnes')] += uni_df[column]            
uni_df[('All imports','tonnes')] = 0
for column in import_columns :
    uni_df[('All imports','tonnes')] += uni_df[column]            
uni_df[('All exports','tonnes')] = 0
for column in export_columns :
    uni_df[('All exports','tonnes')] += uni_df[column]
            
uni_df[('food self-sufficiency','%')] = 100 * uni_df[('All productions','tonnes')] / (uni_df[('All productions','tonnes')]+uni_df[('All imports','tonnes')]+uni_df[('All exports','tonnes')])
In [54]:
display(uni_df[['Area','Year',('food self-sufficiency','%')]].sample(5))
Area Year (food self-sufficiency, %)
3014 Eritrea 1989 91.620924
13730 Small Island Developing States 1967 72.260440
11865 Belgium-Luxembourg 1985 24.354779
10892 Western Asia 2001 63.019547
8482 Suriname 1983 89.769867
In [55]:
plot = uni_df[['Area','Year']]
plot["Value"] = uni_df[('food self-sufficiency','%')]
for year in range(1980, 2010, 5):
    display(year, visualise_world_data_folium(plot, year, False))
    
1980
1985
1990
1995
2000
2005

2.B. Consumer price indices

      1. Definition

      2. Usage

2.B.a. Definition

Consumer price indices (CPI) are a way to measure the changes of the average price level of goods. One can use it to measure inflation or deflation over some period. More technically, for a given item, the CPI is the ratio of the market basket fo two different years. Global CPI is an average of sigle item CPI with some standardized weights.

2.B.b. Usage

The CPI have many usages and is often taken into consideration. For instance it is used for budget and pensions revisions, monetary and economic policies, and economic analysis .

We will use the CPI to answer the following questions: "Are prices more stables in more self-sufficient countries ?", "Is there a link between the CPI and the agricultural features ?"

2.C. Structure of international trade and historical context

Our dataset contains data for the historical period from 1970 to 2015. In order to be able to correctly interpret the results we are going to see, we first made an historical research on this period. We shortly listed below important events of this period for which we think they have had a significant influence on the agriculture and the economy.

There was the Cold war from 1945 to 1990 with two economic superpowers (USA and USSR). The USSR had been dissoluted in 1991. The Japanese economic miracle occured from 1945 to 1990 and allowed Japan to come out of the disastrous state in which it was at the exit of the WW2 and become world's second largest economy. There has been 2 big oil crisis in 1973 and 1979. There has been many wars (Middle East wars 1973-2000 e.g. Yom Kippur War 1973, Islamic Revolution in Iran 1979, Iran–Iraq war 1980-1988, Gulf war 1990-1991, Yugoslav wars 1991-2001...). We have already seen some consequences of such events by dealing with countries names in a previous section.

The third Agricultural Revolution (also known as Green revolution) occurs form 1960 to 1990 and imporved agricultural productions thanks to fertilizers and chemicals.

The following public-domain image from Wikimedia represents developed countries (blue), developing ones (orange) and least developed ones (red) according to the United Nations and International Monetary Fund. We expect to see similar results with our dataset (GDP).

The following image, also from Wikimedia shows the cumulative commercial balance for the period 1980-2008. We also expect to see similar results with our dataset, but there might be difference as we focus on agriculture.

In order to have an idea of the international trade and economy structure, we are interested in GDP:

In [56]:
pivoted_GDP_df = uni_df[['Area','Year']]
pivoted_GDP_df["GDP"] = uni_df["(GDP, million $)"]
pivoted_GDP_df = pivoted_GDP_df.pivot_table(index='Year', columns='Area', values="GDP").dropna(axis=1)
In [57]:
pivoted_GDP_df.sample(5)
Out[57]:
Area Afghanistan Albania Algeria Angola Argentina Aruba Australia Austria Bahrain Bangladesh ... Turkey Uganda United Arab Emirates United Kingdom United Republic of Tanzania United States of America Uruguay Vietnam Zambia Zimbabwe
Year
1988 2664.299991 2460.399974 58655.419182 10627.491744 137512.093210 596.423607 287358.07903 133012.263430 4173.233306 24814.782417 ... 122128.850050 6908.911541 36275.613877 9.101710e+05 6723.160380 5252600.0 8395.984290 5786.496555 3470.831506 10461.551624
1983 3399.439011 2305.384485 48702.201863 6975.014205 94190.653606 346.964735 192185.69133 71944.723076 4577.031528 15760.530887 ... 82911.912652 4667.067724 42803.373401 4.895998e+05 11504.452882 3638100.0 5635.610957 3288.742278 3992.301827 8333.937797
2001 3620.525250 3926.887597 55056.733057 11948.601472 291334.718270 1920.265652 389661.44584 196951.866200 8976.207763 45432.923187 ... 196007.195540 7044.405033 103311.631800 1.613139e+06 13586.831298 10621824.0 20898.758301 32685.198809 4094.451991 7495.469268
2002 4285.170218 4355.865889 56819.210891 15285.592487 110583.590320 1941.094724 435040.91820 212980.372720 9593.509106 47194.755390 ... 232530.484890 7410.282948 109816.091080 1.757510e+06 14140.007894 10977514.0 13606.503907 35064.105501 4193.839169 7111.380000
1984 3360.445864 2275.736342 52827.902259 7393.603048 99490.062045 365.460785 206143.08827 67820.785288 4785.937614 18474.003543 ... 80641.275016 4609.483042 41807.931457 4.614830e+05 10568.330554 4040700.0 5355.153238 5068.648136 3194.055408 6818.391126

5 rows × 146 columns

As we can see on a subset of the correlation matrix below, GDP are often hugely correlated between countries.

In [58]:
selected_countries = ['Algeria', 'Australia', 'Austria', 'Bangladesh', 'China',
                      'Djibouti', 'France', 'Germany', 'India', 'Japan', 'Mali',
                      'Switzerland', 'United States of America']

corr = pivoted_GDP_df[selected_countries].corr()
corr.style.background_gradient(cmap='coolwarm')
Out[58]:
Area Algeria Australia Austria Bangladesh China Djibouti France Germany India Japan Mali Switzerland United States of America
Area
Algeria 1 0.980818 0.922843 0.938941 0.927377 -0.342829 0.912461 0.887093 0.972868 0.768559 0.516948 0.948838 0.909623
Australia 0.980818 1 0.951373 0.953673 0.935553 -0.349658 0.938023 0.919684 0.984724 0.816869 0.520149 0.973969 0.937551
Austria 0.922843 0.951373 1 0.895146 0.825707 -0.437165 0.997136 0.992467 0.913345 0.929508 0.402053 0.988033 0.982963
Bangladesh 0.938941 0.953673 0.895146 1 0.982827 -0.249451 0.87078 0.858809 0.985178 0.738215 0.593815 0.938569 0.902974
China 0.927377 0.935553 0.825707 0.982827 1 -0.195531 0.796355 0.777185 0.977373 0.632289 0.62787 0.884897 0.829639
Djibouti -0.342829 -0.349658 -0.437165 -0.249451 -0.195531 1 -0.491127 -0.510712 -0.297009 -0.529611 0.611626 -0.431398 -0.476802
France 0.912461 0.938023 0.997136 0.87078 0.796355 -0.491127 1 0.995472 0.893948 0.937746 0.341338 0.981299 0.981912
Germany 0.887093 0.919684 0.992467 0.858809 0.777185 -0.510712 0.995472 1 0.873454 0.958066 0.307788 0.978105 0.977361
India 0.972868 0.984724 0.913345 0.985178 0.977373 -0.297009 0.893948 0.873454 1 0.748194 0.567755 0.951028 0.911338
Japan 0.768559 0.816869 0.929508 0.738215 0.632289 -0.529611 0.937746 0.958066 0.748194 1 0.189758 0.909274 0.923719
Mali 0.516948 0.520149 0.402053 0.593815 0.62787 0.611626 0.341338 0.307788 0.567755 0.189758 1 0.429605 0.355082
Switzerland 0.948838 0.973969 0.988033 0.938569 0.884897 -0.431398 0.981299 0.978105 0.951028 0.909274 0.429605 1 0.977426
United States of America 0.909623 0.937551 0.982963 0.902974 0.829639 -0.476802 0.981912 0.977361 0.911338 0.923719 0.355082 0.977426 1

The correlation matrix contains lots of values that are very closed to one (red). This is also true for the whole correlation matrix as seen below:

In [59]:
f = plt.figure(figsize=(19, 15))
plt.matshow(pivoted_GDP_df.corr(), fignum=f.number)
cb = plt.colorbar()
cb.ax.tick_params()
plt.title('Correlation Matrix', fontsize=16);

We then try to clusterize this correlation matrix in order to find countries whose GDP are correlated:

In [60]:
corr = pivoted_GDP_df.corr().values
pdist = spc.distance.pdist(corr)   # vector of ('55' choose 2) pairwise distances
linkage = spc.linkage(pdist, method='complete')
ind = spc.fcluster(linkage, 0.5*pdist.max(), 'distance')
columns = [pivoted_GDP_df.columns.tolist()[i] for i in list((np.argsort(ind)))]
clusterised_df = pivoted_GDP_df.reindex(columns, axis=1)

f = plt.figure(figsize=(19, 15))
plt.matshow(clusterised_df.corr(), fignum=f.number)
cb = plt.colorbar()
cb.ax.tick_params()
plt.title('Correlation Matrix', fontsize=16);

We have found 4 regions in which the GDP is highly correlated and between which the correlation coefficent is lower. We could refine the big clusters by iterating this method.

Interpretation: The correlation matrix of GDP contains lots of values that are very closed to one. This means that GDP in two different countries have a trend to evolve the same way. Therefore, we can say that the world countries have strong enough trading relations to make the GDP evolve the same way. The fact that we have found some main clusters could be interpreted as regions in which the trading relations are more important.

2.D. Economic classification of countries

In [61]:
plot = uni_df[['Area','Year']]
plot["Value"] = uni_df["(GDP, million $)"]
for year in range(1980, 2015, 5):
    display(year, visualise_world_data_folium(plot, year, True))
1980
1985
1990
1995
2000
2005
2010

4. Informed plan for next actions

Our results seem pretty intersting to share to the world. Moreover we have nice interactive maps and we would like to focus more on visual and style than writting on methodology. Therefore, we would like to produce a data story.